Machine Learning introduction and how to create a basic production file

The main goal of this notebook is to show how to take an ML model from a basic Jupyter notebook to a production platform. With the code prepared this way, you do not need integration people touching your code before production, which means a clean handover for deployment.

The dataset can be found here

Credits to one of my favourite data scientists: check out Mike's channel.

Data Import / Setup

The dtypes show that all of the columns are integers except for bare_nuclei, so let's look deeper into why. According to the documentation, this dataset should contain only integers.

Since the column contains "?" values, pandas cannot parse it as numeric and automatically sets its dtype to object.
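A minimal sketch reproducing the issue, assuming a tiny in-memory CSV with hypothetical values instead of the real dataset (only the column names mirror it). A single "?" is enough to keep bare_nuclei from being parsed as a number.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for the real data file.
raw = io.StringIO(
    "clump_thickness,bare_nuclei,class\n"
    "5,1,2\n"
    "3,?,2\n"
    "8,10,4\n"
)
df = pd.read_csv(raw)

# The "?" string prevents numeric parsing, so bare_nuclei stays non-numeric (object).
print(df.dtypes)

# Locate the rows holding the placeholder.
print(df[df["bare_nuclei"] == "?"])
```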

We will inspect the data further using pandas profiling.

Update: "?" is now declared as a missing value in the import.
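One way to do this is the `na_values` parameter of `pd.read_csv`. The sketch below uses the same hypothetical in-memory rows as a stand-in for the real file; after the import, "?" becomes NaN and the column parses as numeric.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for the real data file.
raw = io.StringIO(
    "clump_thickness,bare_nuclei,class\n"
    "5,1,2\n"
    "3,?,2\n"
    "8,10,4\n"
)

# Treat "?" as missing at import time.
df = pd.read_csv(raw, na_values="?")

print(df.dtypes)                        # bare_nuclei is float64 because it now holds a NaN
print(df["bare_nuclei"].isna().sum())   # -> 1
```

The column is float64 rather than int64 because NaN is a float; it can be cast back to integers once the missing values are handled.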

I find this profiling report very useful for basic exploratory data analysis. It saves me a lot of time, so spend some time getting familiar with the data and look for columns whose quality falls short of what the model needs.

Anyway we have

Data cleansing

Missing values
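The exact strategy used in the notebook is not shown here, so this is a minimal sketch of two common options on a hypothetical frame (already imported with "?" as NaN): drop the incomplete rows, or impute the column median and restore the integer dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical frame after importing with na_values="?".
df = pd.DataFrame({
    "clump_thickness": [5, 3, 8, 1],
    "bare_nuclei": [1.0, np.nan, 10.0, 1.0],
    "class": [2, 2, 4, 2],
})

print(df.isna().sum())  # missing values per column

# Option 1: drop incomplete rows (reasonable when only a few rows are affected).
dropped = df.dropna()

# Option 2: impute with the column median, then cast back to integers.
imputed = df.copy()
median = imputed["bare_nuclei"].median()
imputed["bare_nuclei"] = imputed["bare_nuclei"].fillna(median).astype(int)
```

Dropping is simpler and avoids inventing values; imputing keeps every row but should use statistics computed on the training data only once a train/test split exists.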

Duplicates
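A minimal sketch of counting and removing exact duplicate rows with pandas, using hypothetical values:

```python
import pandas as pd

# Hypothetical rows; the second and third are exact duplicates.
df = pd.DataFrame({
    "clump_thickness": [5, 3, 3],
    "bare_nuclei": [1, 2, 2],
    "class": [2, 2, 2],
})

n_dupes = df.duplicated().sum()  # rows that repeat an earlier row
deduped = df.drop_duplicates().reset_index(drop=True)
print(n_dupes, len(deduped))     # -> 1 2
```

If the data carries an ID column, decide whether duplicates should be judged with or without it; `duplicated(subset=[...])` restricts the comparison to the listed columns.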

Model prep
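A minimal sketch of the usual preparation step, assuming "class" is the target column and using scikit-learn's `train_test_split` on hypothetical cleaned data. Stratifying keeps the class ratio equal in both splits, and a fixed `random_state` makes the split repeatable.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical cleaned data; "class" is assumed to be the target.
df = pd.DataFrame({
    "clump_thickness": [5, 3, 8, 1, 10, 2, 7, 4, 6, 9],
    "bare_nuclei":     [1, 2, 10, 1, 10, 1, 8, 1, 9, 10],
    "class":           [2, 2, 4, 2, 4, 2, 4, 2, 4, 4],
})

X = df.drop(columns="class")
y = df["class"]

# Stratify on the target and fix the seed for a reproducible split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)  # -> (8, 2) (2, 2)
```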

Production Model
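The notebook's actual model is not shown, so the following is only a sketch of what a basic production file could look like, assuming a scikit-learn `LogisticRegression` and `pickle` for serialization. The feature list, the `train`/`save`/`load` helpers and the `model.pkl` file name are hypothetical. Bundling the scaler and classifier in one `Pipeline` means the preprocessing ships with the model, so nobody has to re-implement it at deployment.

```python
import pickle
from pathlib import Path

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["clump_thickness", "bare_nuclei"]  # hypothetical subset of columns
TARGET = "class"


def train(df: pd.DataFrame) -> Pipeline:
    """Fit scaler and classifier together as a single deployable object."""
    model = Pipeline([
        ("scale", StandardScaler()),
        ("clf", LogisticRegression()),
    ])
    model.fit(df[FEATURES], df[TARGET])
    return model


def save(model: Pipeline, path: Path) -> None:
    """Serialize the fitted pipeline to disk."""
    with path.open("wb") as f:
        pickle.dump(model, f)


def load(path: Path) -> Pipeline:
    """Load a pipeline written by save(); only load files you trust."""
    with path.open("rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Hypothetical training data standing in for the cleaned dataset.
    data = pd.DataFrame({
        "clump_thickness": [5, 3, 8, 1, 10, 2, 7, 4],
        "bare_nuclei":     [1, 2, 10, 1, 10, 1, 8, 1],
        "class":           [2, 2, 4, 2, 4, 2, 4, 2],
    })
    out = Path("model.pkl")
    save(train(data), out)
    print(load(out).predict(data[FEATURES].head(2)))
```

Unpickling can execute arbitrary code, so the saved file should only ever be loaded from a trusted source.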

Result

Ready for production

Simulate the use of the model in production
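A minimal simulation, assuming the model was pickled at training time (an in-memory `pickle.dumps` stands in for the file here) and that requests arrive as a list of dicts. `predict_records` and the feature list are hypothetical, not part of the original notebook.

```python
import pickle

import pandas as pd
from sklearn.linear_model import LogisticRegression

FEATURES = ["clump_thickness", "bare_nuclei"]  # hypothetical feature list

# Stand-in for the artifact produced at training time.
train_df = pd.DataFrame({
    "clump_thickness": [5, 3, 8, 1, 10, 2, 7, 4],
    "bare_nuclei":     [1, 2, 10, 1, 10, 1, 8, 1],
    "class":           [2, 2, 4, 2, 4, 2, 4, 2],
})
artifact = pickle.dumps(
    LogisticRegression().fit(train_df[FEATURES], train_df["class"])
)


def predict_records(model_bytes: bytes, records: list[dict]) -> list[int]:
    """Score incoming records the way a simple production service might."""
    model = pickle.loads(model_bytes)
    # Build the frame with an explicit column order matching training.
    batch = pd.DataFrame(records, columns=FEATURES)
    return [int(p) for p in model.predict(batch)]


incoming = [
    {"clump_thickness": 2, "bare_nuclei": 1},
    {"clump_thickness": 9, "bare_nuclei": 10},
]
print(predict_records(artifact, incoming))
```

Passing `columns=FEATURES` guarantees the incoming data has the same column names and order as at training time, which scikit-learn checks when a model was fitted on a DataFrame.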